Incorporating automatic speech recognition methods into the transcription of police-suspect interviews: factors affecting automatic performance


Abstract

Introduction: In England and Wales, transcripts of police-suspect interviews are often admitted as evidence in courts of law. Orthographic transcription is a time-consuming process and is usually carried out by untrained transcribers, resulting in records that contain summaries of large sections of the interview and paraphrased speech. The omission or inaccurate representation of important speech content could have serious consequences in a court of law. It is therefore clear that investigation into better solutions for police-interview transcription is required. This paper explores the possibility of incorporating automatic speech recognition (ASR) methods into the transcription process, with the goal of producing verbatim transcripts without sacrificing police time and money. We consider the potential viability of automatic transcripts as a “first” draft that would be manually corrected by police transcribers. The study additionally investigates the effects of audio quality, regional accent, and the ASR system used, as well as the types and magnitude of errors produced and their implications in the context of police-suspect interview transcripts.

Methods: Speech data was extracted from two forensically-relevant corpora, with speakers of two accents of British English: Standard Southern British English and West Yorkshire English (a non-standard regional variety). Both a high-quality and a degraded version of each file were transcribed using three commercially available ASR systems: Amazon, Google, and Rev.

Results: System performance varied depending on the ASR system and the audio quality, and while regional accent was not found to significantly predict word error rate, the distribution of errors varied substantially across accents, with more potentially damaging errors produced for speakers of West Yorkshire English.

Discussion: The low word error rates and easily identifiable errors produced by Amazon suggest that the incorporation of ASR into police-suspect interview transcription could be viable, though more work is required to investigate the effects of other contextual factors, such as multiple speakers and different types of background noise.
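The Results report word error rate (WER), the standard ASR accuracy metric: the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn a system's output into the reference transcript, divided by the number of words in the reference (N), i.e. WER = (S + D + I) / N. The sketch below is a minimal Python illustration of that standard definition, not the authors' scoring procedure. It assumes lowercase, whitespace-tokenised text with no further normalisation (real scoring tools usually also handle punctuation, numerals and hesitations), and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: text is lowercased and split on whitespace only; no other
    normalisation is applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits to turn the first i-1 reference words
    # into the first j hypothesis words (rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical interview utterance: the system drops a single "not".
    reference = "i was not at the house that night"
    hypothesis = "i was at the house that night"
    print(word_error_rate(reference, hypothesis))  # 1 deletion / 8 words
```

In this hypothetical example the output has a WER of only 0.125, yet the single deleted word reverses the meaning of the statement, which is why the type of error, and not just the rate, matters for police-suspect interview transcripts. Because insertions are counted against the reference length, WER can also exceed 1.0.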


Similar resources

Incorporating Contextual Phonetics into Automatic Speech Recognition

This work outlines the problems encountered in modeling pronunciation for automatic speech recognition (ASR) of spontaneous (American) English speech. We detail some of the phonetic phenomena within the Switchboard corpus that make the recognition of this speaking style difficult. Phonetic transcribers found that feature spreading and cue trading made identification of phonetic segmental bounda...


Automatic speech recognition performance on a voicemail transcription task

In this paper, we report on the performance of automatic speech recognition (ASR) systems on voicemail transcription. Voicemail is spontaneous telephone speech recorded over a variety of channels; consequently, it is representative of many challenging problems in speech recognition. In the course of working on this task, several algorithms were developed that focus on different components of an...


Predicting Automatic Speech Recognition Performance

In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so it can reprompt for fresh input, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features which more accurately predict when a recognition hypothesis contains a word e...


Efficient Methods for Automatic Speech Recognition

This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks. The...


Incorporating named entity recognition into the speech transcription process

Named Entity Recognition (NER) from speech usually involves two sequential steps: transcribing the speech using Automatic Speech Recognition (ASR) and annotating the outputs of the ASR process using NER techniques. Recognizing named entities in automatic transcripts is difficult due to the presence of transcription errors and the absence of some important NER clues, such as capitalization and p...



Journal

Journal title: Frontiers in Communication

Year: 2023

ISSN: 2297-900X

DOI: https://doi.org/10.3389/fcomm.2023.1165233